Abstract A low-rank matrix is desired in many machine learning and computer vision problems. Most recent studies use the nuclear norm as a convex surrogate of the rank operator. However, the nuclear norm simply adds all singular values together, and thus the rank may not be well approximated in practical problems. In this paper, we propose to use a log-determinant (LogDet) function as a smooth and closer, though non-convex, approximation to rank for obtaining a low-rank representation in subspace clustering. An augmented Lagrange multipliers strategy is applied to iteratively optimize the LogDet-based non-convex objective function on potentially large-scale data. By making use of the angular information of principal directions of the resultant low-rank representation, an affinity graph matrix is constructed for spectral clustering. Experimental results on motion segmentation and face clustering data demonstrate that the proposed method often outperforms state-of-the-art subspace clustering algorithms.

Keywords Matrix rank approximation • Subspace 
clustering • Nuclear norm • Log-determinant • Low-rank 
representation • Angular information • Segmentation 

1 Introduction 

Matrix rank minimization [1] is ubiquitous in machine learning, computer vision, control, signal processing and system identification. For instance, low-rank representation based subspace clustering [?] and matrix completion [?] methods have achieved great success recently. Subspace clustering [7] is one of the fundamental topics with numerous applications, e.g., image representation [?], face clustering [?], and motion segmentation [?]. It is assumed that high-dimensional data are more likely to lie in a union of low-dimensional subspaces than in one individual subspace. For example, different subspaces are needed to describe the trajectories of different moving objects in a video sequence. Subspace clustering is an intrinsically difficult problem, since we need to simultaneously cluster all data points into multiple groups and find a low-dimensional subspace fitting each group of points.

Subspace clustering has been an active research topic over the past decades. Four main categories of methods have been proposed [?]: iterative, algebraic, statistical, and spectral clustering-based methods. The first three kinds of approaches are sensitive to initialization, noise and outliers; in addition, they are difficult to optimize [?]. Spectral clustering-based methods have achieved promising performance, where the key is to learn a good affinity matrix of data points. For instance, the algorithms of local subspace affinity (LSA) [13], locally linear manifold clustering (LLMC) [14], and spectral local best-fit flats (SLBF) [15] use local information around each point to construct the affinity matrix, while the spectral curvature clustering (SCC) [16] method preserves the global structures of the whole data set in deriving the affinity matrix. Subsequently, K-means [?] or Normalized Cuts (NCut) [?] is applied to the affinity matrix to obtain clustering results.

Recently, some spectral clustering-based methods, such as sparse subspace clustering (SSC) [10] and low-rank representation (LRR) [3], have been proposed and obtain state-of-the-art results in subspace clustering. SSC represents each data point as a sparse linear combination of the other points and solves an $\ell_1$-norm regularized minimization problem for sparsity. SSC shows promising results if the subspaces are either independent or disjoint [?].

The basic idea of LRR is to learn a low-rank representation of data by capturing the global Euclidean structure of the whole data. In this scheme, each data point is represented as a linear combination of the examples in the data matrix itself, and a convex nuclear norm minimization is used as a surrogate of the rank function to obtain the desired low-rank representation. Though its optimization is well studied and has a global optimum, its performance may be far from optimal in real applications because the nuclear norm might not be a good approximation to the rank function. In the rank function, all nonzero singular values contribute equally, whereas the nuclear norm simply adds the singular values together, so each contributes in proportion to its magnitude. As a result, the nuclear norm may be dominated by a few very large singular values and deviate significantly from the true value of the rank. Several papers have considered this problem and designed methods to alleviate it by either thresholding or removing some of the singular values; for instance, singular value thresholding [?] and the truncated nuclear norm [6] both considerably enhance the performance of matrix completion.

In this paper, we propose to use a log-determinant (LogDet) function for rank approximation and study its minimization in subspace clustering. Different from the nuclear norm-based approaches, which minimize the summation of all singular values, our approach aims to minimize the rank by making the contribution of a large singular value much closer to one and that of a small singular value close to zero. In this way, we obtain a closer and more robust approximation to the rank function than the nuclear norm. Since the LogDet function is non-convex, we apply the method of augmented Lagrange multipliers (ALM) to solve the associated optimization for potentially large-scale applications, in which the subproblem for minimizing the LogDet function in each iteration has a closed-form solution. To demonstrate the effectiveness of our LogDet minimization method, we apply it to subspace clustering. By employing a rather simple formulation based on the LogDet function, we obtain a low-rank representation for subspace clustering. Subsequently, we exploit the angular information of principal directions of such a representation to further enhance the separation ability of the affinity matrix. In summary, the main contributions of this work include:


- A more accurate and robust rank approximation is used to obtain the low-rank representation, which is able to capture the global structure of the dataset.

- An iterative optimization algorithm is designed for minimizing this rank approximation-based objective function. Theoretical analysis shows that our algorithm converges to a stationary point. Specifically, the proposed optimization method is applied to subspace clustering.

- Angular information of principal directions of the low-rank representation is employed to further exploit the intrinsic local geometrical structure relevant to the membership of data points.

- Extensive experiments demonstrate the effectiveness of the proposed LogDet minimization method for rank approximation. In particular, when used for subspace clustering, our simple formulation shows favorable performance compared to other state-of-the-art methods, although we do not explicitly account for outliers in our model. This demonstrates the robustness of our approach.

The remainder of the paper is organized as follows: Section 2 provides a brief review of LRR and SSC. In Section 3 we present the proposed approximation and design an efficient optimization scheme. We give the convergence analysis in Section 4. Experimental results are shown in Section 5. Finally, conclusions are drawn in Section 6.

2 Review of LRR and SSC

In this section, we give a brief review of SSC and LRR. 

Let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ be a set of d-dimensional data points drawn from an unknown union of k linear subspaces $S_1, S_2, \ldots, S_k$. The task of subspace clustering is to segment the data points into k subspaces.

LRR tries to seek the lowest rank representation among many possible linear combinations of the bases in a given dictionary, which typically is the data matrix itself. The problem can be formulated as:

$$\min_Z \operatorname{rank}(Z) \quad \text{s.t.} \quad X = XZ, \tag{1}$$

where $Z = [z_1, z_2, \ldots, z_n]$ is the coefficient matrix with each $z_i$ being the representation of $x_i$. The above problem is NP-hard due to the combinatorial nature of the rank function.

The tightest convex relaxation of the rank function [?] is the nuclear norm. For a matrix $D \in \mathbb{R}^{m \times n}$, its nuclear norm is defined as $\|D\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i(D)$, where $\sigma_i(D)$ means the i-th singular value of D. Using this relaxation, LRR solves the following problem:

$$\min_Z \|Z\|_* \quad \text{s.t.} \quad X = XZ. \tag{2}$$

After obtaining Z, the affinity matrix W is defined as

$$W = |Z| + |Z^T|. \tag{3}$$

Then the spectral clustering algorithm Normalized Cuts [?] is used to produce the final segmentation.

SSC aims to find a sparse representation of X by solving the following convex optimization problem:

$$\min_{Z, E, S} \|Z\|_1 + \frac{\lambda_z}{2}\|E\|_F^2 + \lambda_e\|S\|_1 \quad \text{s.t.} \quad X = XZ + E + S, \ \operatorname{diag}(Z) = 0, \tag{4}$$

where $\|S\|_1 = \sum_i \sum_j |S_{ij}|$, S is a sparse matrix containing the gross error, $\|E\|_F^2 = \sum_i \sum_j E_{ij}^2$, and E is a matrix of fitting residuals. After obtaining Z, subsequent procedures are similar to LRR.


3 LogDet Rank Approximation and Its 
Minimization Algorithm 

A function $f: \mathbb{R}^n \to [-\infty, \infty]$ is absolutely symmetric if f(x) is invariant under arbitrary permutations and sign changes of the elements of x. Based on this function f(x), we have the following theorem [23].

Theorem 1 A function $F: \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}$ is unitarily invariant if $F(X) = f(\sigma(X)) = f \circ \sigma(X)$, where $X \in \mathbb{R}^{n_1 \times n_2}$ has the singular value decomposition $X = U \operatorname{diag}(\{\sigma_i\}_{1 \le i \le n}) V^T$, $\sigma(X): \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^n$ collects the singular values of X, and $n = \min(n_1, n_2)$. The gradient of F(X) at X is

$$\frac{\partial F(X)}{\partial X} = U \operatorname{diag}(\theta) V^T, \tag{5}$$

where $\theta = \frac{\partial f(y)}{\partial y}\Big|_{y = \sigma(X)}$.

Equation (5) can be obtained directly from Theorem 3.1 of [?].
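
As an illustration of Eq. (5), the following sketch checks the gradient formula numerically for the LogDet function used later in the paper, $F(X) = \sum_i \log(1 + \sigma_i^2(X))$, for which $\theta_i = 2\sigma_i / (1 + \sigma_i^2)$. The function names, the test matrix, and the finite-difference step are illustrative assumptions and not part of the original work.

```python
import numpy as np


def logdet_rank(X):
    """F(X) = log det(I + X^T X) = sum_i log(1 + sigma_i(X)^2)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(np.log1p(s ** 2)))


def logdet_grad(X):
    """Gradient from Eq. (5): U diag(theta) V^T, theta_i = 2 s_i / (1 + s_i^2)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    theta = 2.0 * s / (1.0 + s ** 2)
    return (U * theta) @ Vt  # scales the columns of U by theta


def numerical_grad(f, X, eps=1e-6):
    """Central finite differences, one entry at a time (for checking only)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2.0 * eps)
    return G


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # hypothetical test matrix
max_gap = float(np.max(np.abs(logdet_grad(X) - numerical_grad(logdet_rank, X))))
print("largest entrywise gap:", max_gap)
```

For this particular F, the SVD-based gradient also agrees with the direct matrix expression $2X(I + X^TX)^{-1}$, which gives a second, independent check.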

In this work, we utilize the unitarily invariant function LogDet to achieve a closer, though non-convex, rank relaxation than the nuclear norm. We apply the method of ALM to the minimization associated with the LogDet rank approximation. To explain our method, we specifically consider using LogDet as a rank surrogate in subspace clustering. We first obtain a low-rank representation of high-dimensional data based on the LogDet optimization. Then we construct an affinity graph matrix for spectral clustering by using the angular information of principal directions of the low-rank representation.


3.1 LogDet rank minimization

We use $\log\det(I + Z^TZ)$ as a surrogate of the rank function of Z. It is obvious that $\log\det(I + Z^TZ) = \sum_{i=1}^{n} \log(1 + \sigma_i^2(Z))$. Because it can be easily verified that $\log(1 + \sigma_i^2(Z)) \le \sigma_i(Z)$ for any $\sigma_i(Z) \ge 0$, we always have $\log\det(I + Z^TZ) \le \|Z\|_*$; in particular, if there are large nonzero singular values, the LogDet function will be much smaller than the nuclear norm, since $\log(1 + \sigma_i^2(Z)) \ll \sigma_i(Z)$ for a large $\sigma_i(Z) > 1$. For small nonzero singular values, the contribution to the LogDet function is also significantly reduced compared to the nuclear norm, because $\log(1 + \sigma_i^2(Z)) \approx \sigma_i^2(Z) \ll \sigma_i(Z)$ when $\sigma_i(Z)$ is small. Because small nonzero singular values are often regarded as coming from noise in the data, the LogDet function reduces the effect of noise more than the nuclear norm does.
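
To make the comparison above concrete, the following sketch evaluates the rank, the nuclear norm, and the LogDet surrogate on a synthetic rank-3 matrix with large singular values. The helper name `surrogates`, the matrix size, and the chosen singular values (100, 50, 10) are assumptions made only for illustration.

```python
import numpy as np


def surrogates(Z, tol=1e-10):
    """Return the rank of Z together with two surrogates computed from its singular values."""
    s = np.linalg.svd(Z, compute_uv=False)
    return {
        "rank": int(np.sum(s > tol)),
        "nuclear": float(np.sum(s)),                # ||Z||_*
        "logdet": float(np.sum(np.log1p(s ** 2))),  # log det(I + Z^T Z)
    }


rng = np.random.default_rng(0)
# Hypothetical rank-3 matrix with prescribed singular values 100, 50, 10.
U, _ = np.linalg.qr(rng.standard_normal((50, 3)))
V, _ = np.linalg.qr(rng.standard_normal((50, 3)))
Z = (U * np.array([100.0, 50.0, 10.0])) @ V.T

vals = surrogates(Z)
# The same LogDet value computed directly from the determinant.
direct_logdet = float(np.linalg.slogdet(np.eye(50) + Z.T @ Z)[1])
print(vals, direct_logdet)
```

For these singular values the nuclear norm is 160, while the LogDet value is about 21.65, so the latter is far less dominated by the largest singular values, although neither quantity equals the rank 3 itself.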

It is worthwhile to note that a similar function $\log\det(X + \delta I)$ was proposed in [?] to approximate the rank, and iterative linearization was used to find a local minimum. However, $\delta$ is a very small constant (e.g., $10^{-6}$), which leads to a biased approximation for small singular values.

This LogDet function is differentiable with respect to the singular values by Theorem 1, and even though it is non-convex, its minimization is rather simple with our optimization method. To explain its minimization, we consider its specific application to subspace clustering. By employing the above LogDet function, we formulate subspace clustering as the following unconstrained non-convex minimization problem:

$$\min_Z \log\det(I + Z^TZ) + \rho\|X - XZ\|_F^2, \tag{6}$$

where $I \in \mathbb{R}^{n \times n}$ is the identity matrix. The first term of (6) minimizes the rank of Z, while the second is a relaxation of X = XZ, which is referred to as the self-expressiveness of X, with Z representing the similarity between data points. Because the LogDet function is not convex in Z, we resort to the ALM technique to solve (6), by rewriting (6) as follows:

$$\min_{Z, W} \log\det(I + Z^TZ) + \rho\|X - XW\|_F^2 \quad \text{s.t.} \quad Z = W. \tag{7}$$

We turn to minimizing the following augmented Lagrangian function:

$$L(Y, Z, W, \beta) = \log\det(I + Z^TZ) + \rho\|X - XW\|_F^2 + \frac{\beta}{2}\|Z - W\|_F^2 + \operatorname{Tr}(Y^T(Z - W)), \tag{8}$$

where $\beta > 0$ is a penalty parameter and Y is the Lagrangian dual variable. With a sufficiently large $\beta$, the objective function converges to the objective function in (6). This can be solved by updating Z, W, and Y alternately while fixing the other variables. Specifically, assume that at the kth iteration we have obtained $Z^k$, $W^k$, and $Y^k$; then for the (k+1)th iteration, the variables in (8) are updated via the following four steps.

Step 1: Computing $W^{k+1}$. Fix $Z^k$ and $Y^k$ and then calculate $W^{k+1}$:

$$W^{k+1} = \arg\min_W \rho\|X - XW\|_F^2 + \frac{\beta_k}{2}\Big\|W - \Big(Z^k + \frac{1}{\beta_k}Y^k\Big)\Big\|_F^2, \tag{9}$$

which has a closed-form solution,

$$W^{k+1} = (\beta_k I + 2\rho X^TX)^{-1}(2\rho X^TX + Y^k + \beta_k Z^k). \tag{10}$$

Step 2: Computing $Z^{k+1}$. Fix $W^{k+1}$ and $Y^k$, and minimize $L(Y^k, Z, W^{k+1}, \beta_k)$ as follows:

$$Z^{k+1} = \arg\min_Z L(Y^k, Z, W^{k+1}, \beta_k) = \arg\min_Z \log\det(I + Z^TZ) + \frac{\beta_k}{2}\Big\|Z - \Big(W^{k+1} - \frac{1}{\beta_k}Y^k\Big)\Big\|_F^2. \tag{11}$$

This can be converted to scalar minimization problems due to the following theorem. As we notice, this can also be rewritten as a special case of the problem in a recent work [25].

Theorem 2 For a unitarily invariant function $F(Z) = f \circ \sigma(Z)$, assuming the SVD of $A \in \mathbb{R}^{m \times n}$ is $A = U\Sigma_A V^T$ with $\Sigma_A = \operatorname{diag}(\{\sigma_{i,A}\}_{i=1}^{\min(m,n)})$, the optimal solution to the following problem

$$\min_Z F(Z) + \frac{\beta}{2}\|Z - A\|_F^2 \tag{12}$$

is $Z^* = U\Sigma_Z^* V^T$, with $\Sigma_Z^* = \operatorname{diag}(\{\sigma_i^*\}_{i=1}^{\min(m,n)})$ obtained by solving the scalar minimization problems

$$\sigma_i^* = \arg\min_{\sigma_i \ge 0} f(\sigma_i) + \frac{\beta}{2}(\sigma_i - \sigma_{i,A})^2, \quad i = 1, \ldots, \min(m,n). \tag{13}$$

Proof Let $A = U\Sigma_A V^T$ be the SVD of A; then $\Sigma_A = U^TAV$. Denoting $X = U^TZV$, which has exactly the same singular values as Z, i.e., $\Sigma_X = \Sigma_Z$, we have

\begin{align}
& F(Z) + \frac{\beta}{2}\|Z - A\|_F^2 \tag{14}\\
={}& F(Z) + \frac{\beta}{2}\|X - \Sigma_A\|_F^2 \tag{15}\\
={}& F(\Sigma_X) + \frac{\beta}{2}\|X - \Sigma_A\|_F^2 \tag{16}\\
={}& F(\Sigma_X) + \frac{\beta}{2}\big(\|X\|_F^2 + \|\Sigma_A\|_F^2 - 2\langle X, \Sigma_A\rangle\big) \tag{17}\\
\ge{}& F(\Sigma_X) + \frac{\beta}{2}\big(\|\Sigma_X\|_F^2 + \|\Sigma_A\|_F^2 - 2\langle \Sigma_X, \Sigma_A\rangle\big) \tag{18}\\
={}& F(\Sigma_X) + \frac{\beta}{2}\|\Sigma_X - \Sigma_A\|_F^2 \tag{19}\\
={}& F(\Sigma_Z) + \frac{\beta}{2}\|\Sigma_Z - \Sigma_A\|_F^2 \tag{20}\\
={}& \sum_i \Big[f(\sigma_i) + \frac{\beta}{2}(\sigma_i - \sigma_{i,A})^2\Big] \tag{21}\\
\ge{}& \sum_i \Big[f(\sigma_i^*) + \frac{\beta}{2}(\sigma_i^* - \sigma_{i,A})^2\Big]. \tag{22}
\end{align}

In the above, (15) holds because the Frobenius norm is unitarily invariant; (16) holds because F(Z) is unitarily invariant; (18) is true by von Neumann's trace inequality; and (20) holds as $\Sigma_X = \Sigma_Z$. The inequality between (15) and (19) can also be obtained by the Hoffman-Wielandt inequality. Therefore, (20) is a lower bound of (14), where $\Sigma_Z$ is obtained by minimizing (20). Note that the equality in (18) is attained if $X = \Sigma_X$. Because $\Sigma_Z = \Sigma_X = X = U^TZV$, the SVD of Z is $Z = U\Sigma_Z V^T$, which is the minimizer of problem (12). Hence the proof is completed.

The first-order optimality condition is that the gradient of (13) with respect to each singular value should vanish. Thus for subproblem (11), we have

$$\frac{2\sigma_i}{1 + \sigma_i^2} + \beta_k(\sigma_i - \sigma_{i,A}) = 0, \quad \text{s.t.} \ \sigma_i \ge 0, \ \text{for } i = 1, \ldots, n, \tag{23}$$

where the SVD of $W^{k+1} - \frac{1}{\beta_k}Y^k$ is $U\operatorname{diag}(\{\sigma_{i,A}\}_{i=1}^{n})V^T$. After multiplying by $1 + \sigma_i^2$, the above equation is cubic and gives three roots. In addition, we need to enforce the nonnegativity of $\sigma_i$. It is easily seen that there exists at least one nonnegative root. Moreover, there is a unique minimizer $\sigma_i^* \in [0, \sigma_{i,A})$ if $\beta_k > 1/4$: the second derivative of $\log(1 + \sigma^2)$ is bounded below by $-1/4$, so the objective in (13) is then strictly convex. Finally, we obtain the update of the Z variable with $Z^{k+1} = U\operatorname{diag}(\sigma_1^*, \ldots, \sigma_n^*)V^T$.

Step 3: Computing $Y^{k+1}$. Fix $Z^{k+1}$ and $W^{k+1}$, and then calculate $Y^{k+1}$ as follows:

$$Y^{k+1} = Y^k + \beta_k(Z^{k+1} - W^{k+1}). \tag{24}$$

Step 4: Updating $\beta_{k+1}$ as $\beta_{k+1} = \gamma\beta_k$. The complete procedure is summarized in Algorithm 1.

Algorithm 1: LogDet Rank Minimization

Input: data matrix X, parameters $\rho > 0$, $\gamma > 1$, and $\beta_0 > 0$.

Initialize: $Z = I \in \mathbb{R}^{n \times n}$, $Y = 0$.

Repeat

1: Update W as: $W^{k+1} = (\beta_k I + 2\rho X^TX)^{-1}(2\rho X^TX + Y^k + \beta_k Z^k)$.

2: Solve Z using (11) and (23).

3: Update the Lagrange multiplier Y and the penalty parameter $\beta$: $Y^{k+1} = Y^k + \beta_k(Z^{k+1} - W^{k+1})$, $\beta_{k+1} = \gamma\beta_k$.

Until stopping criterion is satisfied.

Return $Z^* = Z^{k+1}$.
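
The four update steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `solve_sigma` and `logdet_alm`, the use of `np.roots` for the cubic obtained from Eq. (23), the value `rho=1.0`, and the toy data are all hypothetical choices. The defaults for the initial penalty (0.3), the growth factor (1.1), and the stopping rule (relative change of 1e-5 or 100 iterations) follow the settings reported in the experiments.

```python
import numpy as np


def solve_sigma(sigma_a, beta):
    """Minimize log(1 + s^2) + (beta / 2) * (s - sigma_a)^2 over s >= 0, cf. Eq. (13).

    Multiplying Eq. (23) by (1 + s^2) gives the cubic
    beta*s^3 - beta*sigma_a*s^2 + (beta + 2)*s - beta*sigma_a = 0.
    """
    roots = np.roots([beta, -beta * sigma_a, beta + 2.0, -beta * sigma_a])
    real = roots[np.abs(roots.imag) < 1e-9].real
    # Candidates: the boundary s = 0 plus every nonnegative real stationary point.
    cands = np.concatenate(([0.0], real[real >= 0.0]))
    obj = np.log1p(cands ** 2) + 0.5 * beta * (cands - sigma_a) ** 2
    return float(cands[np.argmin(obj)])


def logdet_alm(X, rho, beta0=0.3, gamma=1.1, max_iter=100, tol=1e-5):
    """ALM iterations for problem (7); returns the representation Z."""
    n = X.shape[1]
    XtX = X.T @ X
    eye = np.eye(n)
    Z, Y, beta = eye.copy(), np.zeros((n, n)), beta0
    for _ in range(max_iter):
        # Step 1: closed-form W update, Eq. (10).
        W = np.linalg.solve(beta * eye + 2.0 * rho * XtX,
                            2.0 * rho * XtX + Y + beta * Z)
        # Step 2: Z update through the SVD of A = W - Y / beta (Theorem 2).
        U, s_a, Vt = np.linalg.svd(W - Y / beta)
        s = np.array([solve_sigma(a, beta) for a in s_a])
        Z_new = (U * s) @ Vt
        # Step 3: multiplier update, Eq. (24).
        Y = Y + beta * (Z_new - W)
        rel_change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        # Step 4: increase the penalty parameter.
        Z, beta = Z_new, gamma * beta
        if rel_change < tol:
            break
    return Z


rng = np.random.default_rng(0)
# Toy data (assumption): two 2-dimensional subspaces in R^10, 10 points each.
X = np.hstack([rng.standard_normal((10, 2)) @ rng.standard_normal((2, 10))
               for _ in range(2)])
Z = logdet_alm(X, rho=1.0)
print(Z.shape)
```

Solving the linear system in Step 1 with `np.linalg.solve` avoids forming the explicit inverse in Eq. (10); for a fixed data matrix one could also reuse an eigendecomposition of $X^TX$ across iterations.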

Problem (7) is non-convex, so it is difficult to give a rigorous mathematical argument for convergence to a (local) optimum. We will provide a theoretical proof that our algorithm converges to an accumulation point and that this accumulation point is a stationary point. Our experiments confirm the convergence of the proposed method on the benchmark datasets. The experimental results are promising, even though the solution obtained by the proposed optimization method may be only a local optimum.


3.2 Affinity graph matrix construction

Now we will construct an affinity matrix W for subspace clustering. The optimal Z* may not accurately describe the relationship between samples if the data are severely corrupted. Therefore, in general, it is not a good idea to construct W by directly using Z*. In the spirit of [3], we construct an affinity matrix in the following way.

Assuming the skinny SVD of Z* is $U^*\Sigma^*(V^*)^T$, we define $M = U^*(\Sigma^*)^{1/2}$ and $N = (\Sigma^*)^{1/2}(V^*)^T$. Based on the weighted eigenvector matrix M or N, we construct an affinity matrix W as follows:

$$W_{ij} = \left(\frac{m_i^T m_j}{\|m_i\|_2\|m_j\|_2}\right)^{2\alpha} \quad \text{or} \quad W_{ij} = \left(\frac{n_i^T n_j}{\|n_i\|_2\|n_j\|_2}\right)^{2\alpha}, \tag{25}$$

where $m_i$ ($n_i$) and $m_j$ ($n_j$) represent the i-th and j-th rows (columns) of M (N), respectively, and the parameter $\alpha \in \mathbb{N}$ tunes the sharpness of the affinity between two points, with $\alpha > 1$ helping separate the clusters. When $\alpha$ increases, the between-cluster separability can be increased, but the within-cluster cohesiveness is degraded. Thus, a suitable $\alpha$ needs to balance within-cluster cohesiveness and between-cluster separability. In this paper, we set $\alpha$ to be 2. Then we have the same post-processing as LRR (see footnote 1). As U* or V* spans the principal directions of Z*, we employ the angle information, or powered correlation coefficients, of the examples, because their lengths may be affected significantly by the noise or outliers in the data.

Now using the resultant affinity matrix, we can apply a spectral clustering algorithm to do segmentation. In this paper, we simply perform NCuts [18] on W. The proposed subspace clustering procedure is summarized in Algorithm 2.

Algorithm 2: The SCLD Algorithm

Input: data matrix X, number of subspaces k, parameters $\rho > 0$, $\gamma > 1$, and $\beta_0 > 0$.

1: Obtain Z* from Algorithm 1.

2: Compute the skinny SVD $Z^* = U^*\Sigma^*(V^*)^T$.

3: Calculate $M = U^*(\Sigma^*)^{1/2}$ or $N = (\Sigma^*)^{1/2}(V^*)^T$.

4: Construct the affinity graph matrix W by (25).

5: Apply W to perform NCuts.

4 Convergence Analysis

In this section, we give the convergence analysis for Algorithm 1. We will show that our optimization algorithm attains at least one stationary point of problem (7). We first rewrite the objective function of (7) as

$$G(Z, W) = F(Z) + \rho\|X - XW\|_F^2 \quad \text{s.t.} \quad Z = W, \tag{26}$$

$$H(Z, W, Y) = G(Z, W) + \langle Z - W, Y\rangle, \tag{27}$$

$$L(Z, W, Y, \beta) = H(Z, W, Y) + \frac{\beta}{2}\|Z - W\|_F^2. \tag{28}$$

Lemma 1 The sequence $\{Y_k\}$ is bounded.

Proof To minimize over Z at step k + 1, the optimal $Z_{k+1}$ needs to satisfy the first-order optimality condition

$$\nabla_Z L(Z, W_{k+1}, Y_k, \beta_k)\big|_{Z_{k+1}} = \nabla_Z F(Z)\big|_{Z_{k+1}} + \beta_k\Big(Z_{k+1} + \frac{1}{\beta_k}Y_k - W_{k+1}\Big) = 0. \tag{29}$$

Note that the updating rule for Y is

$$Y_{k+1} = Y_k + \beta_k(Z_{k+1} - W_{k+1}), \tag{30}$$

thus $\nabla_Z F(Z)\big|_{Z_{k+1}} + Y_{k+1} = 0$. We know from (5) that

$$\nabla_Z F(Z)\big|_{Z_{k+1}} = U\operatorname{diag}\Big(\frac{2\sigma_1}{1 + \sigma_1^2}, \ldots, \frac{2\sigma_n}{1 + \sigma_n^2}\Big)V^T, \tag{31}$$

and $0 \le \frac{2\sigma_i}{1 + \sigma_i^2} \le 1$, so $\nabla_Z F(Z)\big|_{Z_{k+1}}$ is bounded. Then it is seen that $Y_{k+1}$ is bounded, i.e., $\{Y_k\}$ is bounded.

Lemma 2 $\{W_k\}$ and $\{Z_k\}$ are bounded if $\sum_{k=1}^{\infty} \frac{\beta_k}{\beta_{k-1}^2} < \infty$ and $\sum_{k=1}^{\infty} \frac{1}{\beta_{k-1}} < \infty$.

Proof

\begin{align}
L(Z_k, W_k, Y_k, \beta_k) &= L(Z_k, W_k, Y_{k-1}, \beta_{k-1}) + \frac{\beta_k - \beta_{k-1}}{2}\|Z_k - W_k\|_F^2 + \operatorname{Tr}\big((Y_k - Y_{k-1})^T(Z_k - W_k)\big) \nonumber\\
&= L(Z_k, W_k, Y_{k-1}, \beta_{k-1}) + \frac{\beta_k + \beta_{k-1}}{2\beta_{k-1}^2}\|Y_k - Y_{k-1}\|_F^2. \tag{32}
\end{align}

Thus,

\begin{align}
L(Z_{k+1}, W_{k+1}, Y_k, \beta_k) &\le L(Z_k, W_{k+1}, Y_k, \beta_k) \nonumber\\
&\le L(Z_k, W_k, Y_k, \beta_k) \nonumber\\
&\le L(Z_k, W_k, Y_{k-1}, \beta_{k-1}) + \frac{\beta_k + \beta_{k-1}}{2\beta_{k-1}^2}\|Y_k - Y_{k-1}\|_F^2 \nonumber\\
&\le \cdots \nonumber\\
&\le L(Z_1, W_1, Y_0, \beta_0) + \sum_{i=1}^{k}\frac{\beta_i + \beta_{i-1}}{2\beta_{i-1}^2}\|Y_i - Y_{i-1}\|_F^2. \tag{33}
\end{align}

Since the second term in the above inequality is finite, $L(Z_{k+1}, W_{k+1}, Y_k, \beta_k)$ is bounded. We can rewrite

$$L(Z_{k+1}, W_{k+1}, Y_k, \beta_k) + \frac{1}{2\beta_k}\|Y_k\|_F^2 = F(Z_{k+1}) + \rho\|X - XW_{k+1}\|_F^2 + \frac{\beta_k}{2}\Big\|Z_{k+1} - W_{k+1} + \frac{1}{\beta_k}Y_k\Big\|_F^2. \tag{34}$$

Because $L(Z_{k+1}, W_{k+1}, Y_k, \beta_k)$ and $\frac{1}{2\beta_k}\|Y_k\|_F^2$ are bounded and each term on the right-hand side of equation (34) is nonnegative, each term is bounded. $F(Z_{k+1}) = \sum_i \log(1 + \sigma_i^2(Z_{k+1}))$ being bounded implies that all singular values of $Z_{k+1}$ are bounded, and hence $Z_{k+1}$ is bounded. Since $\frac{1}{\beta_k}(Y_{k+1} - Y_k) = Z_{k+1} - W_{k+1}$, clearly $W_{k+1}$ is bounded. Therefore $\{W_k\}$ and $\{Z_k\}$ are bounded.

Theorem 3 $\{Y_k, W_k, Z_k\}$ has at least one accumulation point $\{Y^*, W^*, Z^*\}$, and $\{W^*, Z^*\}$ is a stationary point of optimization problem (7) under the assumption that $\lim_{k \to \infty} \beta_{k-1}(Z_k - Z_{k-1}) = 0$.

Proof $\{Y_k, W_k, Z_k\}$ is a bounded sequence, hence by the Bolzano-Weierstrass theorem, there must be at least one accumulation point, which is denoted by $\{Y^*, W^*, Z^*\}$. Without loss of generality, we assume that $\{Y_k, W_k, Z_k\}$ itself converges to $\{Y^*, W^*, Z^*\}$. Next, we prove that this accumulation point is a stationary point of problem (26). As $Y_k = Y_{k-1} + \beta_{k-1}(Z_k - W_k)$, we have $Z_k - W_k = \frac{1}{\beta_{k-1}}(Y_k - Y_{k-1})$. Because $\beta_{k-1} \to \infty$ and $\{Y_k\}$ is bounded, we get $Z_k - W_k \to 0$, i.e., $Z^* = W^*$. By the first-order optimality condition and the definition of $Z_k$, we have $\nabla_Z F(Z)\big|_{Z_k} + Y_{k-1} + \beta_{k-1}(Z_k - W_k) = \nabla_Z F(Z)\big|_{Z_k} + Y_k = 0$. Letting $k \to \infty$, we get $\nabla_Z F(Z)\big|_{Z^*} + Y^* = 0$. At the kth step, $W_k$ satisfies $(2\rho X^TX + \beta_{k-1}I)W_k = 2\rho X^TX + \beta_{k-1}Z_{k-1} + Y_{k-1}$, i.e., $2\rho X^TX(W_k - I) = \beta_{k-1}Z_{k-1} - \beta_{k-1}W_k + Y_{k-1} = \beta_{k-1}(Z_k - W_k) + \beta_{k-1}(Z_{k-1} - Z_k) + Y_{k-1} = \beta_{k-1}(Z_{k-1} - Z_k) + Y_k$. With the assumption that $\beta_{k-1}(Z_k - Z_{k-1}) \to 0$ [26], we get $2\rho X^TX(W^* - I) = Y^*$.

Now we can see that $\{Y^*, W^*, Z^*\}$ satisfies the KKT conditions of L(W, Z, Y), and thus $\{W^*, Z^*\}$ is a stationary point of (7).

Footnote 1: For LRR, we use equation (12) in [3] rather than (3) to construct W. We also confirmed with an author of [3] that the power 2 in equation (12) is a typo; it should be 4.

5 Experiments and Analysis

In this section, we conduct experiments on the subspace clustering task with both synthetic and real data.

5.1 Experiments with Synthetic Data

We construct 5 independent subspaces whose bases $\{U_i\}_{i=1}^{5}$ are generated by a random rotation matrix R through $U_{i+1} = RU_i$, $1 \le i \le 4$, where $U_1 \in \mathbb{R}^{100 \times 4}$ is a random orthogonal matrix [2]. We sample 20 data vectors from each subspace by $X_j = U_jT_j$, $1 \le j \le 5$, where $T_j$ is a $4 \times 20$ matrix with i.i.d. $\mathcal{N}(0, 1)$ entries. Some data vectors are randomly chosen to be corrupted; for example, a data vector x is corrupted by adding Gaussian noise with zero mean and variance $0.2\|x\|$. We then use SCLD to segment the data into 5 clusters. The subspace clustering error rate, defined as the number of misclassified points divided by the total number of points, is used to assess the performance. We report the clustering error rate (averaged over 30 trials) at different corruption levels in Figure 1. Without any corruption, SCLD can cluster all data points correctly.

5.2 Experiments with Real Data

In this section, we evaluate the effectiveness and robustness of SCLD on benchmark datasets, Extended Yale B (EYaleB) [27, 28] and Hopkins 155 [22]. We compare the proposed method SCLD with several state-of-the-art subspace clustering algorithms: LRR [3], SSC [10], LRSC [30, ?], and local subspace affinity (LSA) [13]. For these methods, we use the parameters given by the respective authors. For our method, we tune $\rho$ to obtain the best performance. Generally, $\rho$ should be relatively large if the data are only slightly corrupted. $\beta$ and $\gamma$ have little influence on the clustering results, so we set $\beta_0 = 0.3$ to ensure the uniqueness of the minimizer and use $\gamma = 1.1$ empirically. Other parameters are shown in Table 1. The experiments are conducted on Windows 7 with 16 GB memory and an Intel Core i5-2300 CPU.

Fig. 1: The clustering error rate with different percentage of corruption on synthetic data. The parameter $\rho$ is tuned to obtain the best performance.

Table 1: Parameter settings of different algorithms.

Method   Face clustering                                        Motion segmentation
         Scenario 1                  Scenario 2
LRR      λ = 0.18 (both scenarios)                              λ = 4
LSA      K = 3, d = 5 (both scenarios)                          K = 8, d = 5
SSC      λ_e = 8/μ_e                 λ_e = 20/μ_e               λ_z = 800/μ_z
LRSC     τ = 0.4, α = 0.045          τ = 0.045, α = 0.045       τ = 420, α = 3000 or α = 5000
SCLD     ρ = 0.08                    ρ = 0.03                   ρ = 55

Fig. 2: Sample images from the Extended Yale B database.

5.2.1 Face Clustering 

Face clustering aims to cluster a set of face images from multiple individuals in order to reveal the identity of these individuals. The EYaleB database includes 2414 frontal images of 38 individuals. For each individual, the images are taken under 64 lighting conditions and can be described by a low-dimensional subspace [?]. The images are resized to 48x42 pixels and each vectorized image is regarded as a data point. Fig. 2 shows some example images from the database.


Table 2: Clustering error rate on the first 10 classes of EYaleB.

Method           LRR     SSC     LSA     LRSC    SCLD
error rate (%)   20.94   35      59.52   35.78   3.59


5.2.1.1 First Experiment Scenario As done in [2], we test the algorithms on the first 10 classes of EYaleB, which consist of 640 frontal face images. More than half of the images are corrupted by shadow and noise. We use this heavily corrupted data to test the effectiveness of our method. As shown in Table 2, SCLD significantly enhances the performance. Specifically, it reduces the clustering error rate by at least 17 percentage points compared to the other algorithms (3.59% versus 20.94% for the best competitor, LRR). Since the only difference between our approach and LRR is the rank approximation, we attribute this improvement to LogDet.

5.2.1.2 Second Experiment Scenario For a fair comparison, we follow the experimental setup of [10]. We divide the 38 subjects into four groups: subjects 1 to 10, 11 to 20, 21 to 30, and 31 to 38. We consider all choices of $n \in \{2, 3, 5, 8, 10\}$ subjects for the first three groups. For the last group, we consider all choices of $n \in \{2, 3, 5, 8\}$. We run our subspace clustering algorithm on each set of n subjects. For all experiments, the stopping criterion for Z is triggered by a relative difference of $10^{-5}$ between two successive iterations, or by a maximum of 100 iterations.

The results are presented in Table 3. For the other methods, we cite the results from Table 5 of paper [?]. SCLD consistently has low clustering error rates and is more stable than the other methods, whose error rates increase drastically as the number of subjects increases to 8 and 10. As shown in Figure 2, there are many sparse within-sample outliers in the face images, e.g., shadows. Although LRR uses a regularization term to account for corruptions, the regularization term does not appear to be well suited to EYaleB. LSA has inferior performance possibly because it does not explicitly exploit the low-rank structure of the data.

Table 3: Clustering error rates (%) on EYaleB.

Method                 LRR     SSC     LSA     LRSC    SCLD
2 Subjects    Mean     2.54    1.86    32.80   5.32    2.79
              Median   0.78    0.00    47.66   4.69    0.78
3 Subjects    Mean     4.21    3.10    52.29   8.47    3.72
              Median   2.60    1.04    50.00   7.81    1.56
5 Subjects    Mean     6.90    4.31    58.02   12.24   4.83
              Median   5.63    2.50    56.87   11.25   2.50
8 Subjects    Mean     14.34   5.85    59.19   23.72   5.45
              Median   10.06   4.49    58.59   28.03   3.52
10 Subjects   Mean     22.92   10.94   60.42   30.36   6.25
              Median   23.59   5.63    57.50   28.75   4.84

5.2.1.3 Third Experiment Scenario In this section, we compare SCLD with the other algorithms when they use RPCA [32] as a preprocessing step. In practice, we do not know the clustering of the data beforehand, and hence we apply RPCA to the collection of all data points for each trial prior to clustering. As shown in Table 4, SCLD is still superior to the other methods even though they apply RPCA to deal with sparse outlying entries. Compared to Table 3, only the clustering error rates of LRSC are reduced in some cases. We can conclude that applying RPCA to all data points simultaneously is not effective in improving clustering performance. This is due to the fact that RPCA seeks a common low-rank subspace, which will decrease the principal angles between subspaces and decrease the distance between data points of different subjects [?].

Table 4: Clustering error rates (%) on EYaleB after applying RPCA simultaneously to all the data in each trial.

Method                 LRR     SSC     LSA     LRSC    SCLD
2 Subjects    Mean     5.72    2.09    32.53   5.67    2.79
              Median   3.91    0.78    47.66   4.69    0.78
3 Subjects    Mean     10.01   3.77    53.02   8.72    3.72
              Median   9.38    2.60    51.04   8.33    1.56
5 Subjects    Mean     15.33   6.79    58.76   10.99   4.83
              Median   15.94   5.31    56.87   10.94   2.50
8 Subjects    Mean     28.67   10.28   62.32   16.14   5.45
              Median   31.05   9.57    62.50   14.65   3.52
10 Subjects   Mean     32.55   11.46   62.40   21.82   6.25
              Median   30.00   11.09   62.50   25.00   4.84

5.2.2 Motion Segmentation

Motion segmentation aims to segment the trajectories associated with n different moving objects into different groups according to their motions in a video sequence. Because different motions can be treated as different subspaces, we use the Hopkins 155 dataset to validate SCLD. This dataset is slightly corrupted, as shown in Figure 3. It consists of 155 sequences of two or three motions and 1 sequence of 5 motions; the latter is regarded as an outlier. Each sequence is regarded as a separate clustering problem.

Table 5: Segmentation error rate (%) on the Hopkins 155 Dataset (155 Sequences).

Method                LRR     SSC     LSA     LRSC    SCLD
2 Motions    Mean     2.13    1.52    4.23    3.69    1.31
             Median   0.00    0.00    0.56    0.29    0.00
3 Motions    Mean     4.03    4.40    7.02    7.69    3.43
             Median   1.43    0.56    1.45    3.80    0.56
All          Mean     2.56    2.18    4.86    4.59    1.79
             Median   0.00    0.00    0.89    0.60    0.00
Time (sec)            1.30    1.04    3.40    0.16    1.49

The experimental results are reported in Table 5, where we also use the results in Table 1 of [?]. It can be seen that SCLD produces superior results compared to the other methods. Over all 155 sequences, its mean error rate is as low as 1.79%. If we use all 156 sequences, the overall error rate of our proposed algorithm is 1.87%. We report the average computation time per sequence at the bottom of Table 5. The computational cost of LRSC is much lower than that of the other methods, while LRR, SSC and SCLD are comparable.

To test the influence of the parameter $\rho$ in our algorithm, we show the clustering error rates of SCLD for different $\rho$ over all 155 sequences in Figure 4. As we can see, when $\rho$ was between 1 and 200, the clustering error varied between 1.79% and 4.67%. This implies that SCLD performs well under a wide range of values of $\rho$.

Fig. 3: Example frames from four video sequences of the Hopkins 155 Database with traced feature points.

Fig. 4: Changes in clustering error rate when varying $\rho$ (horizontal axis: parameter $\rho$, from 1 to 200).

To test the dependence of SCLD on initialization, we apply two other initializations. First, we use the solution from LRR as the initial guess for SCLD. Second, we initialize with random numbers. We find that we still obtain the same results. In fact, it is recommended to use convex relaxation solutions as initialization for non-convex formulations [?].


6 Conclusion 

In this paper we propose to use a log-determinant function (LogDet) as a rank approximation to recover the low-rank representation of high-dimensional data. When applied to subspace clustering, the proposed algorithm, called SCLD, exploits both global and local structures of the data through the LogDet rank approximation and the angle-based affinity matrix. Consequently, it captures more intrinsic information of the data that benefits subspace clustering. Our extensive experimental results show that it outperforms other low-rank representation algorithms based on the nuclear norm. Therefore LogDet appears to be an effective rank approximation function well suited to subspace clustering applications. Although our model is simple and has no explicit modeling of outliers, it is resilient to various corruptions. Our future research will consider modeling corruptions explicitly.